We determined the genome sequence of industrial Saccharomyces cerevisiae strain NAM34-4C, which would be useful for
bioethanol production. The approximately 11.5-Mb draft genome sequence of NAM34-4C will provide remarkable insights into
metabolic engineering for effective production of bioethanol from biomass.
Saccharomyces cerevisiae plays a central role in industrial ethanol
production because of its ability to produce ethanol at high
levels. S. cerevisiae NAM34-4C, a haploid strain derived from the 
industrial yeast strain KF-7 (1), possesses crucial traits required 
for industrial ethanol production from biomass, such as high pro- 
ductivity, thermotolerance, and acid tolerance. In addition, 
Wakamatsu et al. recently reported that S. cerevisiae NAM34-4C 
can produce a considerable amount of ethanol from D-lactate un- 
der acidic conditions (2, 3). There are no other reports regarding 
ethanol production from lactate by S. cerevisiae. Thus, S. cerevisiae 
NAM34-4C possesses a unique genetic background and useful 
genetic diversity for bioethanol production. To better understand 
the unique potential of S. cerevisiae NAM34-4C for bioethanol 
production, we determined its draft genome sequence by combin- 
ing two next-generation sequencing technologies, the GS FLX 
Titanium system (Roche Diagnostics, Switzerland) and the 
SOLiD 3 system (Life Technologies, Inc., Carlsbad, CA). To fill the 
remaining gaps, Sanger sequencing of the amplicons was utilized. 

The NAM34-4C genome was de novo sequenced with the GS 
FLX Titanium system to highly oversample the genome (31.2-fold 
coverage), with a total of 1,030,498 reads and the generation of a 
paired-end library, enabling the assembly of 539 contigs into 56
"supercontigs" (scaffolds) using the GS De Novo Assembler software
(Roche). A genome of 11.5 Mb was covered by 56 scaffolds
(N50 scaffold length, 409,700 bases). Whole-genome resequencing
analysis of the NAM34-4C genome was performed using the 
SOLiD 3 system to improve the sequence quality of the draft ge- 
nome, and 285,503,052 50-base reads were obtained. The SOLiD 3 
reads were aligned to the scaffolds by BWA (4), Bowtie (5), and 
SAMtools (6) to detect the sequencing errors in the scaffolds ob- 
tained from GS FLX sequencing. As a result, 1,730 nucleotide 
differences were revised between the scaffolds generated by the GS 
FLX Titanium system and the reads generated by the SOLiD 3 
system. We confirmed 389 of the intercontig gaps by PCR and
closed them by Sanger sequencing of the amplicons with the
3730xl DNA analyzer (Life Technologies, Inc.).
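
The scaffold statistic reported above can be illustrated with a minimal sketch. It assumes the common definition of N50: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length. The function name `n50` and the toy lengths are hypothetical and are not the NAM34-4C scaffold lengths; the assembler may compute its reported value with slightly different conventions.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the sum.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy scaffold lengths, for illustration only.
toy_lengths = [400, 300, 200, 100]
toy_n50 = n50(toy_lengths)
print(toy_n50)
```

With the toy input, the two longest sequences (400 and 300) are the first to cover half of the 1,000-base total, so the sketch reports the shorter of the two.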

Gene prediction and annotation were performed using AU- 
GUSTUS software with the training set that is available for 
S. cerevisiae (7) and Exonerate software (8). The predicted pro- 
teins were searched against the curated open reading frames of 
the Saccharomyces Genome Database (SGD) (http://www.yeastgenome.org)
(9) by BLASTp software (10), and matches
were found for 5,696 proteins at an E value cutoff of 10^-6.
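
As an illustration of the E value cutoff described above, the following sketch filters BLAST hits by E value. It assumes the default 12-column tabular output of BLAST+ (`-outfmt 6`), in which the E value is the 11th column; the output format used in this work is not stated, and treating the cutoff as inclusive is also an assumption. The function name `filter_hits` and the example identifiers are hypothetical.

```python
def filter_hits(lines, max_evalue=1e-6):
    """Yield (query, subject, evalue) for tabular hits at or below the cutoff.

    Assumes tab-separated lines in the default BLAST+ tabular layout:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])  # 11th column holds the E value
        if evalue <= max_evalue:
            yield fields[0], fields[1], evalue


# Hypothetical example rows, not taken from the NAM34-4C annotation.
example_rows = [
    "g1.t1\tYAL001C\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t400",
    "g2.t1\tYBR002W\t30.0\t50\t35\t2\t1\t50\t10\t60\t0.5\t25.0",
]
kept = list(filter_hits(example_rows))
print(kept)
```

Counting the distinct query identifiers among the kept hits would give the number of predicted proteins with a match, under the stated assumptions.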

Nucleotide sequence accession numbers. The nucleotide se- 
quences of the Saccharomyces cerevisiae NAM34-4C draft genome 
have been deposited in DDBJ/EMBL/GenBank under the accession
numbers BAUH01000001 to BAUH01000154. The version
described in this paper is the first version. 